AMDGPU: Handle s_add_u32 in eliminateFrameIndex #129628

arsenm · 2025-03-04T02:13:19Z

We can fold frame indexes directly into existing immediate operands,
just like is already done for s_add_i32. We happen to use s_add_i32 in
the 32-bit add case, but s_add_u32 appears in the a 64-bit add sequence
of a flat pointer if an addrpacecast source is a frame index.

This avoids, but does not address a failure exposed after
a316539 where two literal operands
end up in the final instruction. The underlying issue still exists for
some instructions without special handling in eliminateFrameIndex.

arsenm · 2025-03-04T02:13:40Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2025-03-04T02:13:53Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

We can fold frame indexes directly into existing immediate operands,
just like is already done for s_add_i32. We happen to use s_add_i32 in
the 32-bit add case, but s_add_u32 appears in the a 64-bit add sequence
of a flat pointer if an addrpacecast source is a frame index.

This avoids, but does not address a failure exposed after
a316539 where two literal operands
end up in the final instruction. The underlying issue still exists for
some instructions without special handling in eliminateFrameIndex.

Patch is 30.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/129628.diff

6 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+3-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll (+2-3)
(added) llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-u32.mir (+123)
(modified) llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll (+21-57)
(modified) llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll (+28)
(modified) llvm/test/CodeGen/AMDGPU/frame-index.mir (+2-2)

diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 7d6990c097774..128cd8244a477 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -2713,7 +2713,8 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
 
       return true;
     }
-    case AMDGPU::S_ADD_I32: {
+    case AMDGPU::S_ADD_I32:
+    case AMDGPU::S_ADD_U32: {
       // TODO: Handle s_or_b32, s_and_b32.
       unsigned OtherOpIdx = FIOperandNum == 1 ? 2 : 1;
       MachineOperand &OtherOp = MI->getOperand(OtherOpIdx);
@@ -2773,7 +2774,7 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
           DstReg = TmpReg;
         }
 
-        auto AddI32 = BuildMI(*MBB, *MI, DL, TII->get(AMDGPU::S_ADD_I32))
+        auto AddI32 = BuildMI(*MBB, *MI, DL, MI->getDesc())
                           .addDef(DstReg, RegState::Renamable)
                           .addReg(MaterializedReg, RegState::Kill)
                           .add(OtherOp);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll
index c3b48b5d2ddff..378c6312c52be 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll
@@ -142,13 +142,12 @@ define amdgpu_kernel void @v_insert_v64i32_varidx(ptr addrspace(1) %out.ptr, ptr
 ; GCN-NEXT:    v_mov_b32_e32 v0, s48
 ; GCN-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:240
 ; GCN-NEXT:    v_mov_b32_e32 v0, s49
-; GCN-NEXT:    s_and_b32 s4, s25, 63
 ; GCN-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:244
 ; GCN-NEXT:    v_mov_b32_e32 v0, s50
-; GCN-NEXT:    s_lshl_b32 s4, s4, 2
+; GCN-NEXT:    s_and_b32 s4, s25, 63
 ; GCN-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:248
 ; GCN-NEXT:    v_mov_b32_e32 v0, s51
-; GCN-NEXT:    s_add_u32 s4, 0, s4
+; GCN-NEXT:    s_lshl_b32 s4, s4, 2
 ; GCN-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:252
 ; GCN-NEXT:    v_mov_b32_e32 v0, s24
 ; GCN-NEXT:    v_mov_b32_e32 v1, s4
diff --git a/llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-u32.mir b/llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-u32.mir
new file mode 100644
index 0000000000000..af61bd70f16b6
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-u32.mir
@@ -0,0 +1,123 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=MUBUFW64 %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=MUBUFW64 %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=MUBUFW64 %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=MUBUFW64 %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=MUBUFW32 %s
+
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=FLATSCRW64 %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=FLATSCRW32 %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=FLATSCRW32 %s
+
+---
+name: s_add_u32__inline_imm__fi_offset0
+tracksRegLiveness: true
+stack:
+  - { id: 0, size: 32, alignment: 16 }
+machineFunctionInfo:
+  scratchRSrcReg:  '$sgpr0_sgpr1_sgpr2_sgpr3'
+  frameOffsetReg:  '$sgpr33'
+  stackPtrOffsetReg: '$sgpr32'
+body:             |
+  bb.0:
+    ; MUBUFW64-LABEL: name: s_add_u32__inline_imm__fi_offset0
+    ; MUBUFW64: renamable $sgpr4 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc
+    ; MUBUFW64-NEXT: renamable $sgpr7 = S_ADD_U32 12, $sgpr4, implicit-def dead $scc
+    ; MUBUFW64-NEXT: SI_RETURN implicit $sgpr7
+    ;
+    ; MUBUFW32-LABEL: name: s_add_u32__inline_imm__fi_offset0
+    ; MUBUFW32: renamable $sgpr4 = S_LSHR_B32 $sgpr32, 5, implicit-def dead $scc
+    ; MUBUFW32-NEXT: renamable $sgpr7 = S_ADD_U32 12, $sgpr4, implicit-def dead $scc
+    ; MUBUFW32-NEXT: SI_RETURN implicit $sgpr7
+    ;
+    ; FLATSCRW64-LABEL: name: s_add_u32__inline_imm__fi_offset0
+    ; FLATSCRW64: renamable $sgpr7 = S_ADD_U32 12, $sgpr32, implicit-def dead $scc
+    ; FLATSCRW64-NEXT: SI_RETURN implicit $sgpr7
+    ;
+    ; FLATSCRW32-LABEL: name: s_add_u32__inline_imm__fi_offset0
+    ; FLATSCRW32: renamable $sgpr7 = S_ADD_U32 12, $sgpr32, implicit-def dead $scc
+    ; FLATSCRW32-NEXT: SI_RETURN implicit $sgpr7
+    renamable $sgpr7 = S_ADD_U32 12, %stack.0, implicit-def dead $scc
+    SI_RETURN implicit $sgpr7
+
+...
+
+---
+name: s_add_u32__kernel__literal__fi_offset96__offset_literal
+tracksRegLiveness: true
+stack:
+  - { id: 0, size: 96, alignment: 16 }
+  - { id: 1, size: 128, alignment: 4 }
+machineFunctionInfo:
+  scratchRSrcReg:  '$sgpr0_sgpr1_sgpr2_sgpr3'
+  frameOffsetReg:  '$sgpr33'
+  stackPtrOffsetReg: '$sgpr32'
+  isEntryFunction: true
+body:             |
+  bb.0:
+    ; MUBUFW64-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal
+    ; MUBUFW64: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW64-NEXT: {{  $}}
+    ; MUBUFW64-NEXT: $sgpr0 = S_ADD_U32 $sgpr0, $noreg, implicit-def $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW64-NEXT: $sgpr1 = S_ADDC_U32 $sgpr1, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW64-NEXT: renamable $sgpr7 = S_MOV_B32 164
+    ; MUBUFW64-NEXT: SI_RETURN implicit $sgpr7
+    ;
+    ; MUBUFW32-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal
+    ; MUBUFW32: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW32-NEXT: {{  $}}
+    ; MUBUFW32-NEXT: $sgpr0 = S_ADD_U32 $sgpr0, $noreg, implicit-def $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW32-NEXT: $sgpr1 = S_ADDC_U32 $sgpr1, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW32-NEXT: renamable $sgpr7 = S_MOV_B32 164
+    ; MUBUFW32-NEXT: SI_RETURN implicit $sgpr7
+    ;
+    ; FLATSCRW64-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal
+    ; FLATSCRW64: renamable $sgpr7 = S_MOV_B32 164
+    ; FLATSCRW64-NEXT: SI_RETURN implicit $sgpr7
+    ;
+    ; FLATSCRW32-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal
+    ; FLATSCRW32: renamable $sgpr7 = S_MOV_B32 164
+    ; FLATSCRW32-NEXT: SI_RETURN implicit $sgpr7
+    renamable $sgpr7 = S_ADD_U32 68, %stack.1, implicit-def dead $scc
+    SI_RETURN implicit $sgpr7
+...
+
+---
+name: s_add_u32__kernel__literal__fi_offset96__offset_literal_live_scc
+tracksRegLiveness: true
+stack:
+  - { id: 0, size: 96, alignment: 16 }
+  - { id: 1, size: 128, alignment: 4 }
+machineFunctionInfo:
+  scratchRSrcReg:  '$sgpr0_sgpr1_sgpr2_sgpr3'
+  frameOffsetReg:  '$sgpr33'
+  stackPtrOffsetReg: '$sgpr32'
+  isEntryFunction: true
+body:             |
+  bb.0:
+    ; MUBUFW64-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal_live_scc
+    ; MUBUFW64: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW64-NEXT: {{  $}}
+    ; MUBUFW64-NEXT: $sgpr0 = S_ADD_U32 $sgpr0, $noreg, implicit-def $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW64-NEXT: $sgpr1 = S_ADDC_U32 $sgpr1, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW64-NEXT: renamable $sgpr7 = S_ADD_U32 164, 0, implicit-def $scc
+    ; MUBUFW64-NEXT: SI_RETURN implicit $sgpr7, implicit $scc
+    ;
+    ; MUBUFW32-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal_live_scc
+    ; MUBUFW32: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW32-NEXT: {{  $}}
+    ; MUBUFW32-NEXT: $sgpr0 = S_ADD_U32 $sgpr0, $noreg, implicit-def $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW32-NEXT: $sgpr1 = S_ADDC_U32 $sgpr1, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
+    ; MUBUFW32-NEXT: renamable $sgpr7 = S_ADD_U32 164, 0, implicit-def $scc
+    ; MUBUFW32-NEXT: SI_RETURN implicit $sgpr7, implicit $scc
+    ;
+    ; FLATSCRW64-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal_live_scc
+    ; FLATSCRW64: renamable $sgpr7 = S_ADD_U32 164, 0, implicit-def $scc
+    ; FLATSCRW64-NEXT: SI_RETURN implicit $sgpr7, implicit $scc
+    ;
+    ; FLATSCRW32-LABEL: name: s_add_u32__kernel__literal__fi_offset96__offset_literal_live_scc
+    ; FLATSCRW32: renamable $sgpr7 = S_ADD_U32 164, 0, implicit-def $scc
+    ; FLATSCRW32-NEXT: SI_RETURN implicit $sgpr7, implicit $scc
+    renamable $sgpr7 = S_ADD_U32 68, %stack.1, implicit-def $scc
+    SI_RETURN implicit $sgpr7, implicit $scc
+...
diff --git a/llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll b/llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll
index 346b69c362c04..96d0e383761d1 100644
--- a/llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll
@@ -38,7 +38,6 @@ define amdgpu_kernel void @soff1_voff1(i32 %soff) {
 ; GFX942-GISEL-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
 ; GFX942-GISEL-NEXT:    v_mov_b32_e32 v1, 1
 ; GFX942-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX942-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v0, s0, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v2, 1, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v3, 2, v0
@@ -76,12 +75,9 @@ define amdgpu_kernel void @soff1_voff1(i32 %soff) {
 ; GFX11-GISEL:       ; %bb.0: ; %bb
 ; GFX11-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
-; GFX11-GISEL-NEXT:    v_mov_b32_e32 v3, 4
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-GISEL-NEXT:    s_add_u32 s0, 0, s0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instid1(SALU_CYCLE_1)
-; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_dual_mov_b32 v3, 4 :: v_dual_add_nc_u32 v0, s0, v0
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_add_nc_u32 v5, 2, v0
 ; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v4, 1, v0
 ; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v0, 4, v0
@@ -113,8 +109,7 @@ define amdgpu_kernel void @soff1_voff1(i32 %soff) {
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_mov_b32 v3, 4
 ; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
-; GFX12-GISEL-NEXT:    s_add_co_u32 s0, 0, s0
-; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instid1(SALU_CYCLE_1)
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2)
 ; GFX12-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX12-GISEL-NEXT:    scratch_store_b8 v0, v1, off offset:1 scope:SCOPE_SYS
 ; GFX12-GISEL-NEXT:    s_wait_storecnt 0x0
@@ -168,7 +163,6 @@ define amdgpu_kernel void @soff1_voff2(i32 %soff) {
 ; GFX942-GISEL-NEXT:    v_lshlrev_b32_e32 v0, 1, v0
 ; GFX942-GISEL-NEXT:    v_mov_b32_e32 v1, 1
 ; GFX942-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX942-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v0, s0, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v2, 1, v0
 ; GFX942-GISEL-NEXT:    scratch_store_byte v2, v1, off sc0 sc1
@@ -207,11 +201,9 @@ define amdgpu_kernel void @soff1_voff2(i32 %soff) {
 ; GFX11-GISEL:       ; %bb.0: ; %bb
 ; GFX11-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v3, 4 :: v_dual_lshlrev_b32 v0, 1, v0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-GISEL-NEXT:    s_add_u32 s0, 0, s0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
 ; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_add_nc_u32 v5, 2, v0
@@ -246,11 +238,9 @@ define amdgpu_kernel void @soff1_voff2(i32 %soff) {
 ; GFX12-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_mov_b32 v3, 4
-; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2)
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
 ; GFX12-GISEL-NEXT:    v_lshlrev_b32_e32 v0, 1, v0
 ; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
-; GFX12-GISEL-NEXT:    s_add_co_u32 s0, 0, s0
-; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
 ; GFX12-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX12-GISEL-NEXT:    scratch_store_b8 v0, v1, off offset:1 scope:SCOPE_SYS
 ; GFX12-GISEL-NEXT:    s_wait_storecnt 0x0
@@ -304,7 +294,6 @@ define amdgpu_kernel void @soff1_voff4(i32 %soff) {
 ; GFX942-GISEL-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; GFX942-GISEL-NEXT:    v_mov_b32_e32 v1, 1
 ; GFX942-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX942-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v0, s0, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v2, 1, v0
 ; GFX942-GISEL-NEXT:    scratch_store_byte v2, v1, off sc0 sc1
@@ -343,11 +332,9 @@ define amdgpu_kernel void @soff1_voff4(i32 %soff) {
 ; GFX11-GISEL:       ; %bb.0: ; %bb
 ; GFX11-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v3, 4 :: v_dual_lshlrev_b32 v0, 2, v0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-GISEL-NEXT:    s_add_u32 s0, 0, s0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
 ; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_add_nc_u32 v5, 2, v0
@@ -382,11 +369,9 @@ define amdgpu_kernel void @soff1_voff4(i32 %soff) {
 ; GFX12-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_mov_b32 v3, 4
-; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2)
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
 ; GFX12-GISEL-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
-; GFX12-GISEL-NEXT:    s_add_co_u32 s0, 0, s0
-; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
 ; GFX12-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX12-GISEL-NEXT:    scratch_store_b8 v0, v1, off offset:1 scope:SCOPE_SYS
 ; GFX12-GISEL-NEXT:    s_wait_storecnt 0x0
@@ -440,7 +425,6 @@ define amdgpu_kernel void @soff2_voff1(i32 %soff) {
 ; GFX942-GISEL-NEXT:    v_mov_b32_e32 v1, 1
 ; GFX942-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX942-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX942-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v0, s0, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v2, 1, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v3, 2, v0
@@ -483,8 +467,7 @@ define amdgpu_kernel void @soff2_voff1(i32 %soff) {
 ; GFX11-GISEL-NEXT:    v_mov_b32_e32 v3, 4
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
-; GFX11-GISEL-NEXT:    s_add_u32 s0, 0, s0
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instid1(SALU_CYCLE_1)
 ; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_add_nc_u32 v5, 2, v0
@@ -520,8 +503,7 @@ define amdgpu_kernel void @soff2_voff1(i32 %soff) {
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_mov_b32 v3, 4
 ; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX12-GISEL-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
-; GFX12-GISEL-NEXT:    s_add_co_u32 s0, 0, s0
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instid1(SALU_CYCLE_1)
 ; GFX12-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX12-GISEL-NEXT:    scratch_store_b8 v0, v1, off offset:1 scope:SCOPE_SYS
 ; GFX12-GISEL-NEXT:    s_wait_storecnt 0x0
@@ -576,7 +558,6 @@ define amdgpu_kernel void @soff2_voff2(i32 %soff) {
 ; GFX942-GISEL-NEXT:    v_mov_b32_e32 v1, 1
 ; GFX942-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX942-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX942-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v0, s0, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v2, 1, v0
 ; GFX942-GISEL-NEXT:    scratch_store_byte v2, v1, off sc0 sc1
@@ -616,11 +597,10 @@ define amdgpu_kernel void @soff2_voff2(i32 %soff) {
 ; GFX11-GISEL:       ; %bb.0: ; %bb
 ; GFX11-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(SALU_CYCLE_1)
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v3, 4 :: v_dual_lshlrev_b32 v0, 1, v0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX11-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
 ; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -657,11 +637,10 @@ define amdgpu_kernel void @soff2_voff2(i32 %soff) {
 ; GFX12-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v2, 2 :: v_dual_mov_b32 v3, 4
-; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(SALU_CYCLE_1)
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2)
 ; GFX12-GISEL-NEXT:    v_lshlrev_b32_e32 v0, 1, v0
 ; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX12-GISEL-NEXT:    s_add_co_u32 s0, 0, s0
 ; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
 ; GFX12-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX12-GISEL-NEXT:    scratch_store_b8 v0, v1, off offset:1 scope:SCOPE_SYS
@@ -717,7 +696,6 @@ define amdgpu_kernel void @soff2_voff4(i32 %soff) {
 ; GFX942-GISEL-NEXT:    v_mov_b32_e32 v1, 1
 ; GFX942-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX942-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX942-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v0, s0, v0
 ; GFX942-GISEL-NEXT:    v_add_u32_e32 v2, 1, v0
 ; GFX942-GISEL-NEXT:    scratch_store_byte v2, v1, off sc0 sc1
@@ -757,11 +735,10 @@ define amdgpu_kernel void @soff2_voff4(i32 %soff) {
 ; GFX11-GISEL:       ; %bb.0: ; %bb
 ; GFX11-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(SALU_CYCLE_1)
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v3, 4 :: v_dual_lshlrev_b32 v0, 2, v0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    s_lshl_b32 s0, s0, 1
-; GFX11-GISEL-NEXT:    s_add_u32 s0, 0, s0
 ; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
 ; GFX11-GISEL-NEXT:    v_add_nc_u32_e32 v0, s0, v0
 ; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -798,11 +775,10 @@ define amdgpu_kernel void @soff2_voff4(i32 %soff) {
 ; GFX12-GISEL-NEXT:    s_load_b32 s0, s[4:5], 0x24
 ; GFX12-GISEL-NEXT:    v_dual_mov_b32 v1, 1 :: v_dual_and_b32 v0, 0x3ff, v0
 ; GFX12-GISEL-NEXT:...
[truncated]

arsenm · 2025-03-05T01:03:04Z

Merge activity

Mar 4, 8:03 PM EST: A user started a stack merge that includes this pull request via Graphite.
Mar 4, 8:04 PM EST: Graphite rebased this pull request as part of a merge.
Mar 4, 8:07 PM EST: Graphite rebased this pull request as part of a merge.
Mar 4, 8:09 PM EST: A user merged this pull request with Graphite.

We can fold frame indexes directly into existing immediate operands, just like is already done for s_add_i32. We happen to use s_add_i32 in the 32-bit add case, but s_add_u32 appears in the a 64-bit add sequence of a flat pointer if an addrpacecast source is a frame index. This avoids, but does not address a failure exposed after a316539 where two literal operands end up in the final instruction. The underlying issue still exists for some instructions without special handling in eliminateFrameIndex.

llvm-ci · 2025-03-05T01:39:56Z

LLVM Buildbot has detected a new failure on builder clang-armv7-lnt running on linaro-clang-armv7-lnt while building llvm at step 11 "setup lit".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/77/builds/9661

Here is the relevant piece of the build log for the reference

Step 11 (setup lit) failure: '/home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/bin/python /home/tcwg-buildbot/worker/clang-armv7-lnt/test/lnt/setup.py ...' (failure)
running develop
/home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages/setuptools/command/easy_install.py:158: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running egg_info
writing LNT.egg-info/PKG-INFO
writing dependency_links to LNT.egg-info/dependency_links.txt
writing entry points to LNT.egg-info/entry_points.txt
writing requirements to LNT.egg-info/requires.txt
writing top-level names to LNT.egg-info/top_level.txt
reading manifest file 'LNT.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'docs/_build'
no previously-included directories found matching 'tests/*/Output'
no previously-included directories found matching 'tests/*/*/Output'
adding license file 'LICENSE.TXT'
writing manifest file 'LNT.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-armv8l-3.10/lnt/testing/profile/cPerf.cpython-310-arm-linux-gnueabihf.so -> lnt/testing/profile
Creating /home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages/LNT.egg-link (link to .)
Adding LNT 0.4.2.dev0 to easy-install.pth file
Installing lnt script to /home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/bin

Installed /home/tcwg-buildbot/worker/clang-armv7-lnt/test/lnt
Processing dependencies for LNT==0.4.2.dev0
Searching for typing
Reading https://pypi.org/simple/typing/
/home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning:  is an invalid version and will not be supported in a future release
  warnings.warn(
Downloading https://files.pythonhosted.org/packages/f2/5d/865e17349564eb1772688d8afc5e3081a5964c640d64d1d2880ebaed002d/typing-3.10.0.0-py3-none-any.whl#sha256=12fbdfbe7d6cca1a42e485229afcb0b0c8259258cfb919b8a5e2a5c953742f89
Best match: typing 3.10.0.0
Processing typing-3.10.0.0-py3-none-any.whl
Installing typing-3.10.0.0-py3-none-any.whl to /home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages
Adding typing 3.10.0.0 to easy-install.pth file

Installed /home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages/typing-3.10.0.0-py3.10.egg
Searching for six
Reading https://pypi.org/simple/six/
Downloading https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl#sha256=4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274
Best match: six 1.17.0
Processing six-1.17.0-py2.py3-none-any.whl
Installing six-1.17.0-py2.py3-none-any.whl to /home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages
Adding six 1.17.0 to easy-install.pth file

Installed /home/tcwg-buildbot/worker/clang-armv7-lnt/test/sandbox/lib/python3.10/site-packages/six-1.17.0-py3.10.egg
Searching for requests
Reading https://pypi.org/simple/requests/
Downloading https://files.pythonhosted.org/packages/f9/9b/335f9764261e915ed497fcdeb11df5dfd6f7bf257d4a6a2a686d80da4d54/requests-2.32.3-py3-none-any.whl#sha256=70761cfe03c773ceb22aa2f671b4757976145175cdfca038c02654d061d6dcc6
error: Download error for https://files.pythonhosted.org/packages/f9/9b/335f9764261e915ed497fcdeb11df5dfd6f7bf257d4a6a2a686d80da4d54/requests-2.32.3-py3-none-any.whl#sha256=70761cfe03c773ceb22aa2f671b4757976145175cdfca038c02654d061d6dcc6: [Errno -3] Temporary failure in name resolution

arsenm added the backend:AMDGPU label Mar 4, 2025 — with Graphite App

arsenm requested review from Pierre-vh, jayfoad, jofrn, rampitec, ritter-x2a and shiltian March 4, 2025 02:13

arsenm marked this pull request as ready for review March 4, 2025 02:13

llvmbot added the llvm:globalisel label Mar 4, 2025

arsenm mentioned this pull request Mar 4, 2025

AMDGPU: Make frame index folding logic consistent with eliminateFrameIndex #129633

Merged

rampitec approved these changes Mar 4, 2025

View reviewed changes

arsenm force-pushed the users/arsenm/eliminateFrameIndex-s-add-u32 branch from d70a17f to 7ef23b8 Compare March 5, 2025 01:04

arsenm force-pushed the users/arsenm/eliminateFrameIndex-s-add-u32 branch from 7ef23b8 to 62bfd5c Compare March 5, 2025 01:06

arsenm merged commit 91aac7c into main Mar 5, 2025
6 of 10 checks passed

arsenm deleted the users/arsenm/eliminateFrameIndex-s-add-u32 branch March 5, 2025 01:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AMDGPU: Handle s_add_u32 in eliminateFrameIndex #129628

AMDGPU: Handle s_add_u32 in eliminateFrameIndex #129628

Uh oh!

arsenm commented Mar 4, 2025

Uh oh!

arsenm commented Mar 4, 2025 •

edited

Loading

Uh oh!

llvmbot commented Mar 4, 2025 •

edited

Loading

Uh oh!

arsenm commented Mar 5, 2025 •

edited

Loading

Uh oh!

Uh oh!

llvm-ci commented Mar 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

AMDGPU: Handle s_add_u32 in eliminateFrameIndex #129628

AMDGPU: Handle s_add_u32 in eliminateFrameIndex #129628

Uh oh!

Conversation

arsenm commented Mar 4, 2025

Uh oh!

arsenm commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge activity

Uh oh!

Uh oh!

llvm-ci commented Mar 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

arsenm commented Mar 4, 2025 •

edited

Loading

llvmbot commented Mar 4, 2025 •

edited

Loading

arsenm commented Mar 5, 2025 •

edited

Loading